Speech Activity Detection and its Evaluation in Speaker Diarization System
نویسندگان
چکیده
In speaker diarization, the speech/voice activity detection is performed to separate speech, non-speech and silent frames. Zero crossing rate and root mean square value of frames of audio clips has been used to select training data for silent, speech and non-speech models. The trained models are used by two classifiers, Gaussian mixture model (GMM) and Artificial neural network (ANN), to classify the speech and non-speech frames of audio clip. The results of ANN and GMM classifier are compared by Receiver operating characteristics (ROC) curve and Detection ErrorTradeoff (DET) graph. It is concluded that neural network based SAD comparatively better than Gaussian mixture model based SAD.
منابع مشابه
Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition
This paper studies the overlapped speech detection for improving the performance of the summed channel speaker recognition system in NIST Speaker Recognition Evaluation (SRE). The speaker recognition system includes four main modules: voice activity detection, speaker diarization, overlapped speaker detection and speaker recognition. We adopt a GMM based overlapped speaker detection system, by ...
متن کاملThe AMI Speaker Diarization System for NIST RT06s Meeting Data
We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Erorr Tradeoff analysis commonly used in speaker detection tasks. The speaker diarization systems are based on the TNO and ICSI system submitted for RT05s...
متن کاملSpeaker Diarization: From Broadcast News to Lectures
This paper presents the LIMSI speaker diarization system for lecture data, in the framework of the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. This system builds upon the baseline diarization system designed for broadcast news data. The baseline system combines agglomerative clustering based on Bayesian information criterion with a second clustering using state-of-th...
متن کاملUniversité Paris Xi Ufr Scientifique D'orsay Le Grade De Docteur En Sciences De L'université Paris Xi Orsay Sujet : Acoustic-based Speaker Diarization
This thesis presents a work focusing on the topic of speaker diarization for different types of audio recordings, especially including broadcast news (BN) and meetings. The speaker diarization is a relatively recent speech processing technique, but it has attracted strong research efforts due to its great benefit to other speech technologies, such as rich transcription, audio indexing and speak...
متن کاملSpeech overlap detection in a two-pass speaker diarization system
In this paper we present the two-pass speaker diarization system that we developed for the NIST RT09s evaluation. In the first pass of our system a model for speech overlap detection is gen erated automatically. This model is used in two ways to reduce the diarization errors due to overlapping speech. First, it is used in a second diarization pass to remove overlapping speech from the data whi...
متن کاملpyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems
pyannote.metrics is an open-source Python library aimed at researchers working in the wide area of speaker diarization. It provides a command line interface (CLI) to improve reproducibility and comparison of speaker diarization research results. Through its application programming interface (API), a large set of evaluation metrics is available for diagnostic purposes of all modules of typical s...
متن کامل